Information Theoretic Model Validation for Spectral Clustering

Authors

  • Morteza Haghir Chehreghani
  • Alberto Giovanni Busetto
  • Joachim M. Buhmann
Abstract

Model validation constitutes a fundamental step in data clustering. The central question is: which cluster model and how many clusters are most appropriate for a given application? In this study, we introduce a method for the validation of spectral clustering based on approximation set coding. In particular, we compare correlation and pairwise clustering to analyze the correlations of temporal gene expression profiles. To evaluate and select clustering models, we calculate their reliable informativeness. Experimental results in the context of gene expression analysis show that pairwise clustering yields superior amounts of reliable information. The analysis results are consistent with the Bayesian Information Criterion (BIC), and exhibit higher generality than BIC.

Related resources

Information theoretic model selection in clustering

Model selection in clustering requires (i) specifying a clustering principle and (ii) deciding on an appropriate number of clusters depending on the noise level in the data. We advocate an information theoretic perspective where the uncertainty in the data set induces an uncertainty in the solution space of clusterings. A clustering model, which can tolerate a higher level of noise in the data th...

Information Theoretic Pairwise Clustering

In this paper we develop an information-theoretic approach for pairwise clustering. The Laplacian of the pairwise similarity matrix can be used to define a Markov random walk on the data points. This view forms a probabilistic interpretation of spectral clustering methods. We utilize this probabilistic model to define a novel clustering cost function that is based on maximizing the mutual infor...

A Comparative Study of Spectral Clustering and Information-theoretic Co-clustering for Video Shot Categorization

Automatic categorization of video shots is important in video indexing and retrieval. To improve the effectiveness of video shot categorization, current researchers have addressed two major issues: i) spatio-temporal coherence from shot to shot, and ii) bipartite correlation between descriptive features and shot categories. In recent works, spectral clustering and information-theoretic co-clust...

NGTSOM: A Novel Data Clustering Algorithm Based on Game Theoretic and Self-Organizing Map

Identifying clusters is an important aspect of data analysis. This paper proposes a novel data clustering algorithm to increase the clustering accuracy. A novel game theoretic self-organizing map (NGTSOM) and neural gas (NG) are used in combination with Competitive Hebbian Learning (CHL) to improve the quality of the map and provide a better vector quantization (VQ) for clustering data. Different ...

Noise Thresholds for Spectral Clustering

Although spectral clustering has enjoyed considerable empirical success in machine learning, its theoretical properties are not yet fully developed. We analyze the performance of a spectral algorithm for hierarchical clustering and show that on a class of hierarchically structured similarity matrices, this algorithm can tolerate noise that grows with the number of data points while still perfec...

Publication date: 2012